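In this paper, we consider finding an approximate second-order stationary point (SOSP) of nonconvex conic optimization, which minimizes a twice differentiable function over the intersection of an affine subspace and a convex cone. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier method for finding an $(\epsilon,\sqrt{\epsilon})$-SOSP of this problem. Our method is not only implementable, but also achieves an iteration complexity of ${\cal O}(\epsilon^{-3/2})$, which matches the best known iteration complexity of second-order methods for finding an $(\epsilon,\sqrt{\epsilon})$-SOSP of unconstrained nonconvex optimization. An operation complexity of $\widetilde{\cal O}(\epsilon^{-3/2}\min\{\ldots\})$ is also established for our method.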
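In this paper, we develop first-order methods for convex optimization with locally Lipschitz continuous gradient (LLCG), which goes beyond the well-studied class of convex optimization with Lipschitz continuous gradient. In particular, we first consider unconstrained convex optimization with LLCG and propose accelerated proximal gradient (APG) methods for solving it. The proposed APG methods are equipped with a verifiable termination criterion and enjoy an operation complexity of ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ and ${\cal O}(\log \varepsilon^{-1})$ for finding an $\varepsilon$-residual solution of an unconstrained convex and strongly convex optimization problem, respectively. We then consider constrained convex optimization with LLCG and propose a proximal augmented Lagrangian method for solving it, which applies one of our proposed APG methods to solve a sequence of proximal augmented Lagrangian subproblems. The resulting method is equipped with a verifiable termination criterion and enjoys an operation complexity of ${\cal O}(\varepsilon^{-1}\log \varepsilon^{-1})$ and ${\cal O}(\varepsilon^{-1/2}\log \varepsilon^{-1})$ for finding an $\varepsilon$-KKT solution of a constrained convex and strongly convex optimization problem, respectively. All the methods proposed in this paper are parameter-free or almost parameter-free, except that knowledge of the convexity parameter is required. To the best of our knowledge, no prior studies have investigated accelerated first-order methods with complexity guarantees for convex optimization with LLCG. All the complexity results obtained in this paper are entirely new.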
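As a rough illustration of why a line search helps when the gradient is only locally Lipschitz, the sketch below runs a textbook FISTA-style accelerated proximal gradient loop with backtracking, stopping on a computable gradient-mapping residual. It is not the method proposed in the abstract above: the function name `apg_backtracking`, its parameters, and the quartic test objective are all assumptions made for illustration.

```python
import numpy as np


def apg_backtracking(f, grad_f, x0, prox_g=None, L0=1.0, eta=2.0,
                     tol=1e-8, max_iter=10000):
    """FISTA-style accelerated proximal gradient with backtracking.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    Minimizes f(x) + g(x), where f is smooth and prox_g(v, step) is the
    proximal map of g. No global Lipschitz constant is needed: the local
    estimate L is increased until a quadratic upper bound holds.
    """
    if prox_g is None:
        prox_g = lambda v, step: v  # g = 0, so the prox is the identity
    x = np.asarray(x0, dtype=float).copy()
    y = x.copy()
    t, L = 1.0, L0
    for k in range(max_iter):
        fy, gy = f(y), grad_f(y)
        while True:
            x_new = prox_g(y - gy / L, 1.0 / L)
            d = x_new - y
            # Sufficient-decrease test: quadratic model with curvature L
            if f(x_new) <= fy + gy @ d + 0.5 * L * (d @ d):
                break
            L *= eta  # bound failed: increase the local Lipschitz estimate
        # Verifiable stopping test: norm of the gradient mapping at y
        if L * np.linalg.norm(d) <= tol:
            return x_new, k + 1
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # extrapolation step
        x, t = x_new, t_new
    return x, max_iter


# Example: a quartic-plus-quadratic objective whose gradient is locally,
# but not globally, Lipschitz continuous; its minimizer is c.
c = np.array([1.0, -2.0, 0.5])
f = lambda x: 0.25 * np.sum((x - c) ** 4) + 0.5 * np.sum((x - c) ** 2)
grad_f = lambda x: (x - c) ** 3 + (x - c)
x_star, iters = apg_backtracking(f, grad_f, np.zeros(3), tol=1e-6)
```

The estimate `L` is never decreased here, which keeps the sketch short; practical variants often also try smaller values of `L` to take longer steps.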
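In this paper, we consider a class of structured monotone inclusion (MI) problems that consist of finding a zero of the sum of two monotone operators, one of which is maximal monotone while the other is locally Lipschitz continuous. In particular, we first propose a primal-dual extrapolation (PDE) method for solving a structured strongly MI problem by modifying the classical forward-backward splitting method with a point and operator extrapolation technique, in which the parameters are adaptively updated by a backtracking line search scheme. The proposed PDE method is almost parameter-free, equipped with a verifiable termination criterion, and enjoys an operation complexity of ${\cal O}(\log \epsilon^{-1})$, measured by the amount of fundamental operations consisting only of evaluations of one operator and the resolvent of the other operator, for finding an $\epsilon$-residual solution of the structured strongly MI problem. We then propose another PDE method for solving a structured non-strongly MI problem by applying the above PDE method to approximately solve a sequence of structured strongly MI problems. The resulting PDE method is parameter-free, equipped with a verifiable termination criterion, and enjoys an operation complexity of ${\cal O}(\epsilon^{-1}\log \epsilon^{-1})$ for finding an $\epsilon$-residual solution of the structured non-strongly MI problem. As a consequence, we apply the latter PDE method to convex conic optimization, conic constrained saddle point, and variational inequality problems, and obtain complexity results for finding an $\epsilon$-KKT or $\epsilon$-residual solution of them under local Lipschitz continuity. To the best of our knowledge, no prior studies have investigated methods with complexity guarantees for solving the aforementioned problems under local Lipschitz continuity. All the complexity results obtained in this paper are entirely new.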
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
An increasing number of public datasets have shown a marked clinical impact on assessing anatomical structures. However, each of the datasets is small, partially labeled, and rarely investigates severe tumor subjects. Moreover, current models are limited to segmenting specific organs/tumors and cannot be extended to novel domains and classes. To tackle these limitations, we introduce embeddings learned from Contrastive Language-Image Pre-training (CLIP) into segmentation models, yielding what we call the CLIP-Driven Universal Model. The Universal Model can better segment 25 organs and 6 types of tumors by exploiting the semantic relationship between abdominal structures. The model is developed from an assembly of 14 datasets with 3,410 CT scans and evaluated on 6,162 external CT scans from 3 datasets. We rank first on the public leaderboard of the Medical Segmentation Decathlon (MSD) and achieve state-of-the-art results on Beyond The Cranial Vault (BTCV). Compared with dataset-specific models, the Universal Model is computationally more efficient (6x faster), generalizes better to CT scans from varying sites, and shows stronger transfer learning performance on novel tasks. The design of the CLIP embedding enables the Universal Model to be easily extended to new classes without catastrophically forgetting the previously learned classes.
Recent advances in self-supervised learning (SSL) in computer vision are primarily comparative: their goal is to preserve invariant and discriminative semantics in latent representations by comparing siamese image views. However, the preserved high-level semantics do not contain enough local information, which is vital in medical image analysis (e.g., image-based diagnosis and tumor segmentation). To mitigate the locality problem of comparative SSL, we propose to incorporate the task of pixel restoration to explicitly encode more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful aid to image understanding that has not drawn much attention in SSL. The resulting framework can be formulated as a multi-task optimization problem on the feature pyramid. Specifically, we conduct multi-scale pixel restoration and siamese feature comparison in the pyramid. In addition, we propose a non-skip U-Net to build the feature pyramid and develop sub-crop to replace multi-crop in 3D medical imaging. The proposed unified SSL framework (PCRLv2) surpasses its self-supervised counterparts on various tasks, including brain tumor segmentation (BraTS 2018), chest pathology identification (ChestX-ray, CheXpert), pulmonary nodule detection (LUNA), and abdominal organ segmentation (LiTS), sometimes outperforming them by large margins with limited annotations.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
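The greedy rule mentioned above can be made concrete in the idealized oracle setting, where the full joint distribution is known. The sketch below assumes discrete features and a joint probability table stored as a NumPy array; it picks the unobserved feature with the largest conditional mutual information with the label, given the values observed so far. The helper names `mutual_information` and `greedy_next_feature` and the array layout are assumptions for illustration, and the sketch does not implement the amortized learning approach the abstract describes.

```python
import numpy as np


def mutual_information(p_xy):
    """Mutual information (in nats) of a 2-D joint probability table."""
    p_xy = p_xy / p_xy.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0  # skip zero cells, where the summand is defined as 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))


def greedy_next_feature(joint, observed):
    """Pick the unobserved feature maximizing I(y; x_i | observed values).

    joint: array of shape (k_1, ..., k_d, k_y) holding p(x_1, ..., x_d, y)
           (hypothetical oracle access to the data distribution).
    observed: dict {feature index: observed value}; the observed values
              must have nonzero probability under the joint.
    Returns (best feature index, its conditional mutual information).
    """
    d = joint.ndim - 1
    # Condition on observed features by slicing the table at their values
    index = tuple(observed.get(i, slice(None)) for i in range(d))
    cond = joint[index + (slice(None),)]
    free = [i for i in range(d) if i not in observed]
    best, best_mi = None, -1.0
    for pos, i in enumerate(free):
        # Marginalize out every other unobserved feature, keeping x_i and y
        other = tuple(a for a in range(len(free)) if a != pos)
        mi = mutual_information(cond.sum(axis=other))
        if mi > best_mi:
            best, best_mi = i, mi
    return best, best_mi


# Toy distribution: y copies x_0, while x_1 is independent noise.
joint = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        joint[a, b, a] = 0.25
first, first_mi = greedy_next_feature(joint, {})
second, second_mi = greedy_next_feature(joint, {0: 1})
```

In this toy case the first query goes to the feature that determines the label, and once it is observed the remaining feature carries no further information about the label.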
Human parsing aims to partition humans in images or videos into multiple pixel-level semantic parts. In the last decade, it has attracted significantly increased interest in the computer vision community and has been utilized in a broad range of practical applications, from security monitoring, to social media, to visual special effects, just to name a few. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions remain unclear. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.
With the development of technology and the sharing economy, Airbnb, a well-known short-term rental platform, has become the first choice of many young people. Pricing on Airbnb has long been a problem worth studying. While previous studies achieve promising results, several deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) research on rental text information is not deep enough; (3) few studies predict the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict Airbnb rental prices. Specifically, we first select statistical features to embed the original rental data. Second, we generate word feature vectors and emotional scores from three different kinds of text information and combine them to form the text feature embedding. Third, we use the POI around the rental house to generate a variety of spatial network graphs and learn the network embedding to obtain the spatial feature embedding. Finally, we combine the three modules into multi-source rental representations and use a fully connected neural network to predict the price. Analysis of the experimental results shows the effectiveness of the proposed model.